Purpose: The average suicide mortality rate among OECD countries is 12.8 per 100,000, compared with 33.5 per 100,000 in Korea. The present study analyzed big data extracted from Google to identify factors related to searches on suicide in Korea. Materials and Methods: Google search trends for the search words suicide, stress, exercise, and drinking were obtained for 2004-2010. Using monthly data, the relationship between the actual number of suicides per year and the search words was examined with multi-level models. Results: Both suicide rates and Google searches on suicide in Korea have increased since 2007. An unconditional slope model indicated that stress- and suicide-related searches were positively related. A conditional model showed that yearly factors associated with suicide directly affected suicide-related searches. The interaction between stress-related searches and the actual number of suicides was significant. Conclusion: The positive relationship between stress- and suicide-related searches further supports the view that stress affects suicide. Taken together and viewed in the context of the big data analysis, our results point to the need for a tailored prevention program. Real-time big data may help to indicate increases in suicidality when search words such as stress and suicide generate greater numbers of hits on portals and social network sites.



Key Words: Internet, suicide, prevention and control, psychological stress, statistical models
In 2010, the suicide rate in Korea was 33.5 per 100,000, far above the average of 12.8 per 100,000 reported for Organization for Economic Co-operation and Development (OECD) countries.¹ With an increase of 101.8% between 2000 and 2010, Korea has experienced the highest increase in suicide rate among OECD countries. Additionally, suicide rates among teenagers and the elderly were reported to be extraordinarily high compared with those in other OECD countries.² The reported causes of suicide in Korea include economic issues, family problems, depression, and anxiety about the future. According to one survey, 15.6% of adults from the general population had seriously considered suicide at least once in their lives.³ In 2007, data from the National Statistical Office indicated that 56.5% of teenagers were distressed by heavy study loads and that 95% of employees were suffering from stress.⁴

The Korean Statistical Information Service defines suicide as "intentional self-harm." In addition, "accidental poisoning" or "poisoning with undetermined intent" can also be deemed to fall under the category of suicide.⁵ Rather than being a one-time event, suicide is a series of steps, or a process, that may include "suicidal ideation → suicide planning → suicidal attempts → suicidal behavior";⁶ intervention is therefore needed to interrupt this process.⁵⁻⁷

To date, research on suicide has focused on psychological, biological, medical, and socio-environmental factors in order to determine risk factors and causes. Suicide is generally the result of compounding social and psychological stressors associated with depression or alcoholism.⁸ Such disorders have been shown to progress negatively over time, resulting in suicide.⁹,¹⁰ Hence, this study was conducted within a theoretical framework based on the stress-vulnerability model in an attempt to explicate the causes of suicide in Korean adults.

Exposure to extreme stress, either acute or chronic, can threaten not only an individual's capabilities or resources but also his or her well-being; such threats are signs of suicidal ideation and suicidal behaviors.⁹,¹⁶ Suicidal behaviors are reportedly a result of an inability to handle what is requested or expected in an individual's life, which stems from the stress that arises from the perception of such a dire reality.¹⁰,¹⁷ Even when individuals experience stress of the same intensity, its effects on suicidal behaviors vary between individuals. Many studies have examined potential factors that may mediate the relationship between stress and suicidal behaviors: cognitive factors,¹⁰,¹⁴ such as self-worth, family support, and social support, as well as lifestyle factors,¹⁶,¹⁸,¹⁹ such as smoking, drinking, physical exercise, and nutrition, have been reported in the literature.

Lifestyle factors, and problematic drinking behaviors in particular, are reported to be associated with higher odds of abusing other addictive substances and of acting on suicidal impulses in situations where it is difficult to maintain self-control.¹⁸,²⁰ Among alcoholics, suicide attempt rates are 10 times higher than those of their non-alcoholic counterparts.²⁰ In contrast, among health-promoting lifestyle behaviors, exercise is reported to be a protective factor for preventing diseases or facilitating recovery from diseases, as well as for maintaining and promoting optimal mental health.

News coverage of suicide in newspapers, broadcasts, and the Internet may popularize suicide.²¹ Celebrity suicides in particular, which usually garner detailed news coverage, can cause the Werther effect.²¹⁻²⁵ This effect has been reported to increase the risk of suicide up to five-fold or even 14.3-fold.²²⁻²⁴ In Korea, the number of suicides in March 2005, just after a celebrity had committed suicide, was 1309, almost twice that of the previous month (736 cases).²⁶ According to one study,²³ after a celebrity suicide in 2008, a much higher number of individuals who had attempted suicide using the same method as the celebrity were admitted to the emergency room compared with the previous period.

In addition, suicide rates are higher among divorced or bereaved individuals than among their counterparts.²⁷ Analyses of the association between suicide and socio-economic variables such as divorce rate, birth rate, female labor force participation rate, migration, income, and education level have shown that a high level of urbanization is associated with lower suicide rates.²⁸⁻²⁹

For these reasons, systematic strategies to prevent suicide are needed, as are studies that include macroscopic (time-sequential and spatial) variables reflecting social influences. However, it is difficult to respond effectively in the early stages of suicidal behavior, particularly where social and psychological factors have an extensive impact, and the complexity of suicidal impulses makes it difficult to establish a plan for predicting and managing suicide. In such situations, analysis of big data may be effective in managing suicide at the national level.³⁰⁻³¹

Big data can be understood as extremely large data sets that cannot be collected, stored, managed, and analyzed with conventional approaches such as database management systems. In the public sector, big data has been used in disease prevention, prediction, treatment, and patient management through the sharing of genetic and biological resources.³²,³³

In recent years, the amount of data transmitted via smart devices and social network services has increased exponentially. Data are now recognized as economic assets:³²,³⁴ by analyzing big data, the Obama administration and other government bodies, as well as multinational information technology companies all over the world, are able to generate meaningful and useful information.³⁴

For example, the National Institutes of Health in the U.S. reduced its expenses by $50 million per year by means of the Pillbox service, which uses two-way interaction between manufacturers and users to provide information on various medications at users' request. Google provides a real-time flu forecasting service that tracks the spread of influenza worldwide by analyzing the search words users enter on the web. Such applications of big data, together with the development of analysis methods, may allow more accurate predictions of various aspects of society. For this reason, big data is recognized as a cost-effective resource.³³

Existing methods for identifying suicide-related factors, such as questionnaire surveys and clinical studies, have the advantage of enabling investigation of related variables based on data obtained from individuals; however, they also have limitations, and the magnitude of the association between suicide and the investigated variables remains to be clarified. Suicide is a complex social phenomenon, and, as the Werther effect suggests, its diffusion is macroscopic in both its time-sequential and spatial aspects; this highlights the importance of applying big data to the analysis of suicide-related risk factors. Nevertheless, no studies have analyzed big data related to suicide. The purposes of this study were to utilize big data obtained from Google statistics and to analyze the determinants of suicide-related searches by means of a multi-dimensional analysis.



MATERIALS AND METHODS 



Research design and keyword search 

The research design of this study was based on a review of the literature demonstrating that stress is closely related to depression and suicide, and that health-related lifestyle factors have mediating effects on the relationship between stress and suicide. This study was therefore designed to analyze the determinants of the number of suicide-related searches in Korea using big data. The institutional review board of Inje University Seoul Paik Hospital approved this research.

Fig. 1. Research model. Yearly characteristic: yearly suicide rate (YSR); monthly predictors: stress search volume (STSV), drinking search volume (DSV), and exercise search volume (ESV); outcome: suicide search volume (SSV).

This study employed Google search trends, a service that analyzes the search words entered by users worldwide and provides standardized statistics on the number of searches for a specific word in a specific region at a specific time, extracted according to specified conditions. The Google search words used in this study were "stress," "drinking," "exercise," and "suicide," each entered in both English and Korean.

By means of a multi-dimensional analysis, an attempt was made to determine whether the yearly suicide rate, the unemployment rate, reports of a celebrity suicide, and the monthly number of searches for stress, drinking, and exercise were related to the number of searches on suicide (Fig. 1). The following research questions were assessed:

1) Is there a difference between the amount of stress experienced by Korean people and the number of suicide-related searches by year?

2) Do monthly factors (stress, drinking, and exercise, as measured by the number of related searches) affect the number of suicide-related searches?

3) Do monthly and yearly factors affect the number of suicide-related searches in Korea?

4) Is there an interaction between monthly and yearly factors in their effect on the number of suicide-related searches in Korea?

5) Does the number of suicide-related searches affect the number of stress-related searches in Korea?

Data analysis 

This study built a multi-level model relating the monthly number of searches for the four keywords above (primary level, month) to Korea's suicide rate, unemployment rate, and whether a celebrity suicide was reported (secondary level, year), using data from January 1, 2004 to December 31, 2010 (Fig. 2). In the figure, the peak marks the suicide of a famous politician in October 2010.

A multi-level model, commonly called a hierarchical linear model (HLM), is used when a dependent variable is to be predicted not only from primary-level variables but also from higher-level variables. A multi-level model can reflect the variability and characteristics of both the lower and higher levels of data collection.³⁵

Parameters were estimated with restricted maximum likelihood (REML), which accounts for the degrees of freedom lost in estimating the fixed effects when the variances of the random effects are estimated.³³ For the final estimation of the fixed effects, robust standard errors were applied, which do not assume that the dependent variable is normally distributed. Stress-, exercise-, and drinking-related searches at the primary level were group-mean centered, and suicide rates at the secondary level were grand-mean centered; the variables were then entered into the model. SPSS software, version 20.0 (SPSS Inc., Chicago, IL, USA), was used for the descriptive statistical analysis, and HLM software, version 7.0 (SSI Inc., Chicago, IL, USA), was used for the multi-level model analysis. The basic form of the HLM is as follows:

<Level 1 model>

$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}$$

$Y_{ij}$: suicide search numbers in month $i$ of year $j$
$\beta_{0j}$: intercept of year $j$
$\beta_{1j}$: regression coefficient of variable $X$ for year $j$
$X_{ij}$: independent variables (stress-, drinking-, and exercise-related search numbers) in month $i$ of year $j$
$r_{ij}$: residual at level 1 (month) that is not explained by the level-1 predictors, owing to random effects in month $i$ of year $j$

<Level 2 model>

$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$$

$\gamma_{00}$, $\gamma_{10}$: intercepts of the level-2 (year) model
$\gamma_{01}$, $\gamma_{11}$: regression coefficients at level 2
$W_j$: predictor variables at level 2
$u_{0j}$, $u_{1j}$: residuals by year that are not explained by the level-2 (year) characteristics, owing to random effects at level 2

To examine the determinants of suicide-related searches with a multi-dimensional analysis, the following four models were employed:

Model 1 (Basic model):

$$\mathrm{SSV}_{ij} = \gamma_{00} + u_{0j} + r_{ij}$$

Model 2 (Unconditional slope model):

$$\mathrm{SSV}_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{STSV}_{ij} + \gamma_{20}\,\mathrm{DSV}_{ij} + \gamma_{30}\,\mathrm{ESV}_{ij} + u_{0j} + u_{1j}\,\mathrm{STSV}_{ij} + u_{2j}\,\mathrm{DSV}_{ij} + u_{3j}\,\mathrm{ESV}_{ij} + r_{ij}$$



Fig. 2. Suicide-related search results from Google Trends (http://www.google.co.kr/trends/) in Korea. The x-axis shows month by year, and the y-axis shows suicide search volume. *Peak (SSV=100, Oct. 2010) refers to the point of the suicide incident when a famous politician committed suicide.

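Model 3 (Conditional model):

$$\mathrm{SSV}_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{YSR}_j + \gamma_{02}\,\mathrm{UER}_j + \gamma_{03}\,\mathrm{CES}_j + \gamma_{10}\,\mathrm{STSV}_{ij} + \gamma_{20}\,\mathrm{DSV}_{ij} + \gamma_{30}\,\mathrm{ESV}_{ij} + u_{0j} + u_{1j}\,\mathrm{STSV}_{ij} + r_{ij}$$

Model 4 (Interaction model):

$$\mathrm{SSV}_{ij} = \gamma_{00} + \gamma_{01}\,\mathrm{YSR}_j + \gamma_{10}\,\mathrm{STSV}_{ij} + \gamma_{11}\,\mathrm{YSR}_j \times \mathrm{STSV}_{ij} + \gamma_{20}\,\mathrm{DSV}_{ij} + \gamma_{30}\,\mathrm{ESV}_{ij} + u_{0j} + u_{1j}\,\mathrm{STSV}_{ij} + r_{ij}$$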



RESULTS 



Descriptive statistics for major study variables 

A descriptive analysis was conducted to test the normality of the variables (Table 1). Skewness and kurtosis appeared to meet the normality assumptions.³⁶ The suicide rate in Korea showed an increasing trend, and the volume of Google searches related to suicide followed a trend similar to that of the actual suicide rate. In particular, the volume of suicide-related searches increased in 2005, 2008, and 2010 after reports of celebrity suicides, indicating a risk of copycat suicides (Fig. 3). Table 2 shows the suicide rates and monthly suicide-related search volume (SSV) of Korea and other OECD countries. For Korea, both "suicide" and its Korean equivalent were used as search words.

Multi-level model analysis 

The results of the multi-level analysis of the determinants of suicide searches are shown in Table 3. To address research question 1, Model 1, which contains no independent variables, was used to test whether SSV differs by year by analyzing the yearly-level variance in monthly SSV. In the fixed-effect analysis, the average monthly suicide-related search volume in Korea was 55.20, which was statistically significant (p<0.001). In the random-effect analysis, the monthly-level variance was σ²=256.61 and the yearly-level variance was τ²=127.98; the yearly-level variance was statistically significant (χ²=41.91, p<0.001).

Table 1. Descriptive Statistics for Study Variables
(units: suicide rate per 100,000; search volume)

Yr (suicide rate)  Suicide                    Stress                     Drinking                   Exercise
                   Mean±SD     K      S       Mean±SD     K      S       Mean±SD     K      S       Mean±SD     K      S
2004 (29.5)        45.2±8.3    -1.86  -0.22   116.8±32.9  0.0    0.50    74.6±23.0   2.04   -1.11   135.6±24.2  -0.92  -0.55
2005 (29.9)        58.9±18.5   ?      ?       139.2±15.?  0.39   1.28    ?           ?      1.11    ?           -1.44  -0.41
2006 (26.2)        42.2±7.2    -1.03  -0.55   124.5±24.9  -0.32  ?       74.3±16.9   -1.40  ?       128.4±17.2  -0.64  ?
2007 (28.7)        47.3±7.9    -0.62  -0.14   99.5±14.4   -0.35  -0.53   64.0±9.4    -0.40  -0.01   96.8±14.4   -0.18  -0.21
2008 (29.9)        55.8±22.3   6.93   2.52    99.9±15.2   -1.76  -0.19   74.3±8.4    -0.98  0.02    104.2±16.8  -0.41  -0.68
2009 (33.8)        58.8±19.1   4.54   1.96    109.0±25.0  -0.43  -0.55   85.8±18.4   0.43   0.87    115.3±21.0  -1.12  -0.38
2010 (33.5)        78.3±20.1   2.34   1.16    106.8±19.9  -1.37  0.22    92.5±21.7   -0.57  0.09    116.3±22.4  -0.86  0.60

K, kurtosis; S, skewness; ?, illegible.

Table 2. Suicide Rate and Suicide-Related Search Volume among OECD Countries
(units: suicide rate per 100,000; search volume)

Country         2005             2006             2007             2008             2009             2010
                Suicide* SSV†    Suicide* SSV†    Suicide* SSV†    Suicide* SSV†    Suicide* SSV†    Suicide* SSV†
United States   11.2     74.2    11.3     64.2    11.7     59.7    12.0     61.0             57.5             57.4
United Kingdom  6.7      87.0    6.7      77.0    6.3      63.8    6.9      67.0    6.8      56.8    6.7      53.4
Australia       10.3     79.1    10.4     69.6    10.8     60.2    10.8     54.5    10.5     49.9    10.6     51.0
South Korea     29.9     58.9    26.2     42.2    28.7     47.3    29.0     55.8    33.8     58.8    33.5     78.3

OECD, Organization for Economic Co-operation and Development.
*Suicide rates per 100,000 (OECD Health Data, 2012).
†SSV refers to monthly suicide search volume, i.e., the number of searches on suicide relative to the total number of searches performed on Google (the likelihood of a search in a specific region at a specific time).

Fig. 3. Suicide rate and suicide-related search volume in Korea.

The proportion of SSV variance attributable to differences between years was calculated with the intra-class correlation coefficient (ICC), which reflects the similarity among lower-level units (months) belonging to the same higher-level unit (year):

$$\mathrm{ICC} = \frac{\tau^2}{\sigma^2 + \tau^2} = \frac{127.98}{256.61 + 127.98} \approx 0.33$$

where $\tau^2$ is the level-2 (year) variance and $\sigma^2$ is the level-1 (month) variance.

Thus, yearly-level variance accounted for about 33.2% of the total variance in monthly SSV, and monthly-level variance accounted for the remaining 66.8%.
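As a cross-check on the Model 1 figures, the yearly means and standard deviations of SSV in Table 1 (12 months per year, 7 years) can be plugged into the classical one-way random-effects ANOVA estimators. This is a swapped-in technique, not the REML routine in HLM 7.0, although for balanced data such as these the two coincide whenever the between-year estimate is positive. The sketch assumes that the Table 1 SDs are sample SDs (n-1 denominator), and the standard-error line assumes an uncorrected cluster-robust form; small differences from Table 3 are expected because the Table 1 inputs are rounded.

```python
import math
from statistics import mean

# Yearly SSV summaries from Table 1: (mean, SD); each year has 12 monthly values.
SSV_BY_YEAR = {
    2004: (45.2, 8.3), 2005: (58.9, 18.5), 2006: (42.2, 7.2), 2007: (47.3, 7.9),
    2008: (55.8, 22.3), 2009: (58.8, 19.1), 2010: (78.3, 20.1),
}
MONTHS = 12


def one_way_random_effects(summary, n):
    """ANOVA estimators for y_ij = g00 + u_0j + r_ij with n units per group."""
    means = [m for m, _ in summary.values()]
    j = len(means)
    g00 = mean(means)                                     # grand mean (balanced data)
    sigma2 = mean(sd ** 2 for _, sd in summary.values())  # pooled within-year variance
    ss_between = sum((m - g00) ** 2 for m in means)
    ms_between = n * ss_between / (j - 1)
    tau2 = max((ms_between - sigma2) / n, 0.0)            # between-year variance
    icc = tau2 / (tau2 + sigma2)
    chi2 = n * ss_between / sigma2                        # homogeneity test, df = j - 1
    robust_se = math.sqrt(ss_between / j ** 2)            # uncorrected cluster-robust SE
    return g00, sigma2, tau2, icc, chi2, robust_se


g00, sigma2, tau2, icc, chi2, se = one_way_random_effects(SSV_BY_YEAR, MONTHS)
print(f"g00={g00:.2f} sigma2={sigma2:.2f} tau2={tau2:.2f} "
      f"ICC={icc:.3f} chi2={chi2:.2f} (df={len(SSV_BY_YEAR) - 1}) SE={se:.2f}")
```

With these rounded inputs the sketch prints a grand mean of about 55.21 (55.20 in Table 3), σ² ≈ 255.93 (256.61), τ² ≈ 127.34 (127.98), ICC ≈ 0.332 (0.332), χ² ≈ 41.83 on 6 df (41.91), and SE ≈ 4.27 (4.28).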






yonsei Med J http://www.eymj.org Volume 55 number 1 january2014 



Google Search Trends for a Big Data Analysis 



Table 3. Multi-Level Model Analysis of Suicide-Related Searches

Parameter            Model 1                  Model 2                    Model 3                  Model 4
                     Unconditional model      Unconditional slope model  Conditional model        Interaction model
Fixed effect: Coef. (S.E., p)
Level 1
  Intercept (γ00)    55.20 (4.28, p<0.001)    55.20 (4.28, p<0.001)      55.93 (36.16, p=0.220)
  STSV                                        0.38 (0.17, p=0.071)       0.24 (0.15, p=0.161)
  DSV                                         -0.08 (0.10, p=0.437)      -0.07 (0.10, p=0.490)
  ESV                                         -0.22 (0.11, p=0.104)      -0.13 (0.11, p=0.214)
Level 2
  YSR                                                                    3.35 (0.87, p=0.031)
  UER                                                                    -1.79 (10.24, p=0.872)
  CES                                                                    13.01 (4.00, p=0.048)
Interaction
  STSV×YSR                                                                                        0.07 (0.02, p=0.032)
Random effect: variance (χ², p)
  Level 2, u0        127.98 (41.91, p<0.001)  135.34 (50.32, p<0.001)    9.11 (4.83, p=0.183)     45.85 (17.10, p=0.005)
  Level 1, r         256.61                   213.73                     226.29                   227.33
  STSV                                        0.20 (12.41, p=0.053)      0.07 (16.40, p=0.012)    0.04 (9.91, p=0.077)
  DSV                                         0.02 (5.29, p>0.500)
  ESV                                         0.08 (2.52, p>0.500)
ICC                  0.332                    0.387                                               0.168
Deviance             710.25                   707.12                     685.79                   707.33

STSV, stress-related search volume; DSV, drinking-related search volume; ESV, exercise-related search volume; YSR, yearly suicide rate; UER, unemployment rate; CES, celebrity suicide; ICC, intraclass correlation coefficient.



This study therefore rejected the null hypothesis of the χ² test, which states that the yearly means of suicide-related searches do not vary across years beyond what would be expected from month-to-month variation within a year. In this model, the deviance was 710.25.

Generally, an ICC greater than 0.05 suggests meaningful between-group variation, and even when the ICC is below 0.05, a multi-level analysis can be conducted if prior empirical research indicates between-group variation.³⁷ The ICC of 0.332 therefore supports entering variables at both the monthly and yearly levels of a multi-level model: although SSV is affected by monthly factors, the influence of yearly factors cannot be ignored.
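The 0.05 rule of thumb can be applied directly to the variance components reported for each model. The short sketch below recomputes the ICCs shown in Tables 3 and 4 from the level-2 (u0) and level-1 (r) variances and compares them with that threshold; it only re-derives reported figures, and the helper name `icc` is illustrative.

```python
# Variance components as printed in Tables 3 and 4:
# (level-2 intercept variance u0, level-1 residual variance r, reported ICC)
REPORTED = {
    "Table 3, Model 1 (SSV)": (127.98, 256.61, 0.332),
    "Table 3, Model 2 (SSV)": (135.34, 213.73, 0.387),
    "Table 3, Model 4 (SSV)": (45.85, 227.33, 0.168),
    "Table 4, Model 1 (STSV)": (165.49, 488.02, 0.253),
    "Table 4, Model 2 (STSV)": (169.45, 440.64, 0.278),
}
THRESHOLD = 0.05  # rule-of-thumb ICC above which between-year variation is assumed


def icc(tau2: float, sigma2: float) -> float:
    """Proportion of total variance that lies between years."""
    return tau2 / (tau2 + sigma2)


for label, (tau2, sigma2, reported) in REPORTED.items():
    value = icc(tau2, sigma2)
    print(f"{label}: ICC={value:.3f} (reported {reported:.3f}); "
          f"above {THRESHOLD}: {value > THRESHOLD}")
```

All five recomputed values agree with the printed ICCs to within about 0.001 (Models 1 and 2 of Table 3 round to 0.333 and 0.388 from the printed variances), and all lie well above 0.05.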

Model 2 was used to address research question 2. The effects of monthly factors on SSV were estimated as fixed effects, and random effects were analyzed to determine whether these effects differed by year. In the fixed-effect analysis, drinking-related search volume (DSV) and exercise-related search volume (ESV) did not appear to affect SSV, but stress-related search volume (STSV) showed a marginal effect (coef.=0.38, p=0.071). In the random-effect analysis, the variance of the STSV slope across years approached significance (χ²=12.41, p=0.053), and the intercept differed by year (χ²=50.32, p<0.001). These results support the need to enter yearly variables. In other words, SSV tended to increase as monthly STSV increased, and this effect appeared to differ by year. The ICC for yearly SSV was 0.39, and the deviance was 707.12.

Model 3 was designed to test research question 3 and analyzed SSV by including both yearly and monthly factors in the model. This model also included DSV and ESV, which were entered as fixed effects only because their random effects were not significant in Model 2. In the fixed-effect analysis of suicide searches, none of the monthly factors was statistically significant. The yearly suicide rate (YSR) affected SSV to a statistically significant degree (coef.=3.35, p=0.031); in other words, increases in YSR were associated with increases in SSV. The occurrence of a celebrity suicide (CES) in a given year also had a statistically significant effect on SSV (coef.=13.01, p=0.048), whereas the yearly unemployment rate did not (coef.=-1.79, p=0.872). In the random-effect analysis, the variance of the STSV slope across years was statistically significant (χ²=16.40, p=0.012), whereas the remaining between-year variance in the intercept was not (χ²=4.83, p=0.183). In this model, the deviance was 685.79.

Model 4 was designed to test the interaction between STSV as a monthly factor and YSR as a yearly factor. The interaction was statistically significant (coef.=0.07, p=0.032), indicating that the relationship between stress-related search volume and suicide-related search volume differs according to YSR. In other words, in years with higher suicide rates, a large number of stress-related searches was accompanied by a greater increase in suicide-related searches. This result also suggests that YSR increases the number of suicide-related searches. The ICC of yearly SSV was 0.168, and the deviance was 707.33.
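The interaction can be read as a year-specific stress slope. Assuming the Model 4 specification with a grand-mean-centered YSR, the STSV slope in year j is γ10 + γ11(YSR_j - mean YSR). The sketch below uses the reported interaction coefficient γ11 = 0.07 (Table 3) and the yearly suicide rates in Table 1 to show how far each year's slope shifts from that of a year with an average suicide rate. γ10 is not needed, because it cancels when years are compared; the helper name `slope_shifts` is illustrative, and this is a reading aid rather than a re-analysis.

```python
from statistics import mean

GAMMA_11 = 0.07  # STSV x YSR coefficient reported for Model 4 (Table 3)
# Yearly suicide rates per 100,000 from Table 1.
YSR = {2004: 29.5, 2005: 29.9, 2006: 26.2, 2007: 28.7,
       2008: 29.9, 2009: 33.8, 2010: 33.5}


def slope_shifts(ysr_by_year, gamma_11):
    """Change in each year's STSV slope relative to an average-rate year.

    Slope_j = gamma_10 + gamma_11 * (YSR_j - mean YSR); only the second term
    is returned, because gamma_10 is the same for every year.
    """
    grand = mean(ysr_by_year.values())
    return {year: gamma_11 * (rate - grand) for year, rate in ysr_by_year.items()}


for year, shift in slope_shifts(YSR, GAMMA_11).items():
    print(f"{year}: STSV slope shifted by {shift:+.3f}")
```

With these inputs, the 2009 slope (the highest suicide rate) sits about 0.25 above that of an average year and the 2006 slope about 0.28 below it, a gap of roughly 0.53 between the two years.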

Thus far, factors affecting SSV have been tested with a multi-level analysis based on the stress-vulnerability model. The finding that stress-related searches affected suicide-related searches was supported in this study, but it remained necessary to test whether the relationship is circular, that is, whether suicide-related searches also affect stress-related searches. Therefore, another analysis was conducted to examine, by year, the effect of monthly suicide-related searches on stress-related searches (Table 4).

In the fixed effect of Model 1, the overall average stress-related search volume was 113.67 (t=22.62, p<0.001). The ICC was 25.3%, and STSV differed by year, as indicated by the significant variance at the year level (χ²=30.42, p<0.001).

In the fixed-effect analysis of Model 2, the intercept for STSV remained significant (t=22.62, p<0.001), but SSV did not appear to affect STSV (coef.=0.07, t=0.286, p=0.785). This supports the validity of the one-way model: that is, stress-related searches increase suicide-related searches, but not vice versa. In the random-effect analysis, however, the effect of SSV on stress-related searches varied significantly by year (χ²=15.29, p=0.018), and STSV also differed by year (χ²=33.69, p<0.001). The ICC of yearly stress-related search volume was calculated to be 0.28, and the deviance was 761.69.

Table 4. Multi-Level Model Analysis of Stress-Related Searches

Parameter          Model 1 Unconditional model          Model 2 Unconditional slope model
Fixed effect       Coef.    S.E.    t-ratio             Coef.    S.E.    t-ratio
Level 1
  Intercept (γ00)  113.67   5.02    22.62 (p<0.001)     113.67   5.02    22.62 (p<0.001)
  SSV                                                   0.07     0.24    0.286 (p=0.785)
Random effect      SD       Var.    χ²                  SD       Var.    χ²
  Level 2, u0      12.86    165.49  30.42 (p<0.001)     13.02    169.45  33.69 (p<0.001)
  Level 1, r       22.09    488.02                      20.99    440.64
  SSV                                                   0.52     0.27    15.29 (p=0.018)
ICC                0.253                                0.278
Deviance           761.68                               761.69

SSV, suicide-related search volume; Var., variance; ICC, intraclass correlation coefficient.

DISCUSSION

The ultimate goal of this study was to analyze the determinants of searches concerning suicide using big data. To this end, the stress-vulnerability model was employed as the theoretical basis of the study. In this study, the suicide rate of Korea and the volume of suicide-related searches on Google showed similar trends, and the results further suggested that copycat suicides may result from the broadcasting of celebrity suicides. This finding is consistent with previous studies that have reported the serious impact of suicide-related press reports.²⁵⁻²⁷

Recent suicide rates and suicide-related search numbers in the major OECD countries have either remained stable or decreased, whereas Korea's suicide rate has risen to almost three times the average suicide rate of other OECD members. In this study, the average monthly suicide-related search volume on Google in Korea was 55.2. Moreover, both suicide-related searches and stress-related searches varied from year to year.

In the analysis testing the effects of stress-, drinking-, and exercise-related searches on suicide-related searches, stress-related searches appeared to have the greatest influence on suicide-related searches. In other words, more stress-related searches were associated with more suicide-related searches. This result is consistent with previous studies reporting that stress directly affects suicide.¹²,¹⁴,¹⁷,²⁰ Therefore, when there is a surge of keyword searches related to stress or suicide on search portals or social network services, it may be possible to curtail suicidal impulses by analyzing various types of data, including the age and search patterns of individual users, in order to provide timely intervention.
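The idea of watching for surges in stress- or suicide-related searches can be made concrete with a simple alert rule. The sketch below is one possible approach, assuming a monthly search-volume series and a hypothetical rule that flags a month whose value exceeds the mean of the preceding six months by more than two standard deviations. The window length, threshold, series, and the function name `flag_surges` are illustrative choices and were not part of this study's analysis.

```python
from statistics import mean, stdev
from typing import List


def flag_surges(series: List[float], window: int = 6, z_threshold: float = 2.0) -> List[int]:
    """Return indices of points that exceed the trailing-window mean by more than
    z_threshold trailing-window standard deviations (hypothetical alert rule)."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        sd = stdev(history)
        if sd == 0:
            continue  # flat history: the z-score is undefined, so skip this point
        z = (series[i] - mean(history)) / sd
        if z > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical monthly suicide-related search volumes.
ssv = [50, 52, 48, 51, 49, 50, 51, 90, 55, 51]
print(flag_surges(ssv))  # [7]
```

Using a trailing window means each month is compared only with data available at that time, which is what a real-time monitoring system would have; a production rule would also need to account for seasonality and for surges driven by news coverage, such as the celebrity suicides noted above.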

Additionally, although monthly factors did not affect the number of suicide-related searches once yearly factors were included, the actual suicide rate and other yearly factors significantly affected it. This may be because the effects of the monthly factors were attenuated by controlling for the yearly suicide rate. It may also indicate that yearly suicide-related factors directly affected suicide-related searches. The interaction between stress-related searches and suicide rates was significant: in years with high suicide rates, a large number of stress-related searches was associated with more suicide-related searches.

There are few previous studies on suicide risk factors using big data. This study revealed that suicide searches and suicide rates were interrelated. Therefore, it may be possible to set up a systematic plan for preventing suicide at the governmental level. For instance, in Finland, government departments made a comprehensive effort to prevent suicide by implementing a national project (1986-1996) based on 1397 psychological autopsy reports related to suicide (including medical and social security data, as well as interview materials from family, friends, and doctors). Following its implementation, the suicide rate in Finland decreased from 30.3 suicides per 100,000 people in 1990 to 20.4 in 2004 and 17.3 in 2010. Considering that the decrease in Finland's suicide rates was based on a review of offline reports at the government level, Korea may now be in a position to use big data to set up a nationwide suicide prevention plan.³⁸

In the health and welfare sector in Korea, governmental and public institutions already manage many different types of big data; however, their utilization and application are still at an early stage. To develop efficient strategies for preventing suicide and providing tailored services through big data, we suggest the following. First, collaborative and systematic approaches should be developed between government departments and private institutes to manage health and welfare-related big data in an integrated way. Currently, in Korea, the National Health Insurance Corporation, the Korean Food & Drug Administration, and other national research institutes manage big data, but much of such data is collected and stored through web search portals or social network services run by private institutes. Therefore, classifying data-driven information and securing personal data should also be addressed as a further societal challenge.

Second, there is an urgent need to further develop methods
for analyzing and processing big data in the health and
welfare sectors. To this end, education programs are needed
to train data management professionals, so-called "data
scientists," to strengthen their ability to extract hidden
information from large-scale, unstructured big data. The
development of cloud computing services to store, classify,
and analyze non-relational, unstructured data should also be
at the core of this process. Additionally, priority should be
given to developing technology that enables the full
"collection → storage → analysis → deduction" cycle of big
data in the health and welfare sectors.

Third, a policy on data security and the protection of
personal information should be put in place for the use of
big data. At present, however, relevant laws and systems
are nearly non-existent in Korea and have not even been
discussed. To prevent privacy-related crimes such as
violations of cyber civil rights, strict controls on data
processing and information access, together with guaranteed
anonymity, must therefore accompany the use of big data. To
this end, a policy should be established that balances the
utilization and the protection of big data in the health and
welfare sectors.

For suicide prevention, the policy implications of this
study are as follows. First, age-specific intervention
programs for suicide prevention warrant development, such as
school-wide programs for youth, workplace programs for
workers, and aging-friendly programs for the elderly. Second,
applications such as a "Respect for Life Online Gatekeeper"
should be developed to reduce the suicide-associated risks
revealed through big data analysis and to provide tailored
programs in real time when warning signs of suicide are
detected. Third, region-specific suicide prediction models,
such as a "Respect for Life Prediction System," should be
developed to analyze suicidal behavior by region and to
prevent suicide in real time.

Finally, this study has the following limitations. First,
rather than analyzing individual characteristics, this study
analyzed groups as a whole; therefore, applying the results
to individuals risks the ecological fallacy. Second, this
study used data obtained from Google search trend statistics,
so its representativeness may be questioned. It is therefore
recommended that big data also be collected through various
channels, such as other web search portals, blogs, Internet
cafes, various types of social network services, and web
boards, to analyze factors related to searches on suicide.
Third, suicide-related buzz on the Internet tends to spread
rapidly during the first week after its onset and has a life
cycle of about three weeks. 39 An analysis employing a time
lag of one to three weeks is therefore necessary; however,
this study did not apply such a lag, and future research
should address this issue. Fourth, this study analyzed search
data generated by unspecified individuals, so it cannot be
used to predict suicidal behavior. Future research should
address this issue by analyzing social big data that capture
emotions and psychological behaviors related to suicide.
Fifth, the yearly unemployment rate was not statistically
significant in the model; this result nevertheless warrants
further study.

In conclusion, this study used big data to analyze
suicide-related factors, with the ultimate aim of proposing
that big data, which exist in various forms across many areas
of society, be used to establish national suicide prevention
plans. Training skilled professionals and building the related
infrastructure should be pursued systematically so that big
data can be employed for various purposes in the health and
welfare sectors in the future.